INTRODUCTION:

Autism spectrum disorders (ASD) are a group of neurodevelopmental disorders caused by a
range of genetic, epigenetic and environmental factors, and a mixed genetic/epigenetic model
has been proposed (Jiang et al., 2004). While the main focus of autism research remains on
genetic causes, increasing attention has been drawn to the role of epigenetic factors, which
have been shown to play a role in idiopathic autism. Our previously published study revealed a
significant association of the C677T polymorphism in the MTHFR gene with idiopathic autism in
simplex (SPX) autism families (Liu et al., 2011). Given this finding, and given that de novo
CNV rates are consistently higher in SPX ASD (5.8%-10.2%) than in familial ASD (2-3%), we
hypothesize that the low-activity MTHFR 677T allele leads to increased global DNA
hypomethylation, which in turn increases the generation of de novo CNVs and thereby raises
the risk of developing sporadic autism. We proposed to test 1) the association of the MTHFR
677T allele with the rate of ASD-related de novo CNVs; 2) the association of the MTHFR 677T
allele with an increased level of global hypomethylation; and 3) the association of the level
of global hypomethylation with an increased rate of ASD-related de novo CNVs.

KEYWORDS 

Autism,  Sporadic  Cases,  MTHFR,  Hypomethylation,  Differentially  Methylated  Regions 
(DMR),  Copy  Number  Variation  (CNV) 

OVERALL  PROJECT  SUMMARY 

This pilot project started in September 2013. Over more than two years of project
execution, we have achieved the aims set out in the original proposal. Although no publication
has yet come out of the study, its findings allow us to prepare a manuscript that we aim to
submit to a top journal this March; a follow-up study, carried out with the PI's other funding
source, will very likely yield another manuscript in the middle of this year.

KEY  RESEARCH  ACCOMPLISHMENTS: 

• Completed microarray data analysis on 510 SPX families; identified 99 individuals
carrying pathogenic CNVs; confirmed 33 of these CNVs to be de novo.

• 510 SPX families (both parents and affected individuals) were genotyped for the MTHFR
functional polymorphism C677T using a TaqMan assay.

• Both experimental and bioinformatics pipelines were established for global methylation
profiling using the MBD-Seq strategy with the Ion Torrent Proton.

• 31 ASD cases with de novo pathogenic CNVs and 31 ASD cases without de novo
pathogenic CNVs were profiled by MBD-Seq. Among all 62 ASD cases, 35, 23 and 4 carry the
CC, CT and TT genotypes, respectively, at the MTHFR C677T locus.

• The following data analyses were performed:

1. The MBD-Seq data QA/QC: Data quality was measured in five ways to ensure that the
data were of good quality prior to analysis. The first was a measure of saturation, with a
minimum cutoff score of 0.5, to determine whether the data create reproducible coverage of
the reference genome. The second measure was enrichment, with a minimum cutoff score of
1.7, which measures how well the methylation capture method worked and therefore how
enriched the sample is for methylated DNA. The third measure was 5 times or greater
coverage, with a minimum cutoff of 5%, meaning that at least 5% of the sequence needed to
have at least 5 times sequencing coverage. The fourth measure was the number of total
sequencing reads for a sample, with a minimum of 20 million reads. The fifth measure was
the number of unique reads for a sample, with a minimum of 15 million reads as a cutoff,
where a unique read was defined as a read that does not share a stop or start point with
any other read. The numbers of total and unique reads were plotted for each sample
individually to ensure that they met these criteria. Overall, all 62 samples passed all five
quality measurements and, thus, were included in the analysis.

2. Global Methylation Index (GMI): The global methylation index (GMI) was calculated
across the genome for each sample. The average GMI was then calculated for each intended
comparison group (CNV carriers versus non-carriers; the C/C genotype versus the C/T and
T/T genotypes at the MTHFR C677T polymorphism) and compared between groups using
t-tests. While no significant difference in global methylation level was found between the
CNV+ and CNV- groups, ASD cases carrying the low-activity T allele of the MTHFR C677T
variant had a lower global methylation level than the CC homozygous group; this difference
was marginally significant (p=0.0569).

3. Detection of Differentially Methylated Regions (DMRs) between the C/C and
C/T+T/T groups: The methylation level within each 200 bp window was calculated and
normalized to reads per kilobase per million (RPKM). Following a comparison of RPKM
values in each 200 bp window between the C/C and C/T+T/T groups at p=0.05, 238
differentially methylated regions (DMRs) were identified. Of these, 140 DMRs were
hypermethylated in the C/C group relative to the C/T+T/T group, and 98 DMRs were
hypermethylated in the C/T+T/T group relative to the C/C group. The RPKM values for the
DMRs found to be hypermethylated in the C/T+T/T group were generally lower than those
for the DMRs identified as hypermethylated in the C/C group. It is of great interest that the
identified DMRs were not evenly distributed across chromosomes.

4. Most significantly: The differentially methylated regions (DMRs) were found to be
highly biased towards autism-related genes and CpG islands. This may imply a major
mechanism for the etiology of sporadic autism: autism-causing environmental factors may
act as modulators of MTHFR-mediated epigenomic regulation of specific autism-related
genes.

CONCLUSION: 

The proposed research aims of the funded pilot project were successfully realized, with a
significant finding that points to a potential major mechanism for the etiology of sporadic
cases of autism.

PUBLICATIONS,  ABSTRACTS  AND  PRESENTATIONS:  N/A 
INVENTIONS,  PATENTS  AND  LICENSES:  N/A 
REPORTABLE  OUTCOMES: 

The findings from the study allow us to prepare a manuscript that we aim to submit to a
top journal this March; a follow-up study, carried out with the PI's other funding source, will
very likely yield another manuscript in the middle of this year.

OTHER  ACHIEVEMENTS:  N/A 
REFERENCES: 

Jiang YH, Sahoo T, Michaelis RC, Bercovich D, Bressler J, Kashork CD, et al. A mixed
epigenetic/genetic model for oligogenic inheritance of autism with a limited role for UBE3A.
Am J Med Genet. 2004;131A(1):1-10.

Liu X, et al. Population- and family-based studies associate the MTHFR gene with idiopathic

APPENDICES


Findings stated in the progress report that are to be included in the manuscript


MBD-Seq  Data  Quality 

Data quality was measured in a number of ways to ensure that the data were of good
quality prior to analysis. The results of these measurements for each sample are summarized in
Table 1. The first was a measure of saturation, with a minimum cutoff score of 0.5, to determine
whether the data create reproducible coverage of the reference genome (Lienhard et al., 2014).
The second measure was enrichment, with a minimum cutoff score of 1.7, which measures how
well the methylation capture method worked and therefore how enriched the sample is for
methylated DNA. The third measure was 5 times or greater coverage, with a minimum cutoff of
5%, meaning that at least 5% of the sequence needed to have at least 5 times sequencing
coverage. Figure 1 is an example of data quality output, while Figure 2 shows the distributions
of the quality metrics for each sample group.


Figure 1: Example Quality Control Graphs of Subject 21.

Graphs created during MeDIPs data quality control. A: saturation analysis determines if the data generated creates a
reproducible coverage of the reference genome (Lienhard et al., 2014); B: representation of the number of times (X)
coverage achieved with data generated, as a percentage of all data generated. Total >=5X coverage is the summation
percentage of 5-5X and >5X coverage.


Figure 2: Data Quality Metrics Grouped by Genotype

Boxplots of quality control measures of the data grouped by rs1801133 (C677T) genotype groups. Each sample's
data needed to pass all three thresholds to be considered for analysis. A: percentage of at least 5 times coverage
(minimum = 5%); B: relative methylation enrichment score (minimum = 1.7); C: saturation score (minimum = 0.5).

The fourth measure was the number of total sequencing reads for a sample, with a
minimum of 20 million reads. The fifth measure was the number of unique reads for a sample,
with a minimum of 15 million reads as a cutoff, where a unique read was defined as a read that
does not share a stop or start point with any other read. The numbers of total and unique reads
were plotted for each sample individually to ensure that they met these criteria; this can be
seen in Figure 3.
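The unique-read definition above can be illustrated with a minimal sketch. It reads the definition literally (a read counts as unique only if no other read shares its start point or its stop point) and assumes that reads are available as `(chromosome, start, stop)` tuples. The function name, the tuple layout and the example reads are hypothetical and do not come from the project's pipeline.

```python
from collections import Counter


def count_unique_reads(reads):
    """Count reads that share neither a start nor a stop point with another read.

    `reads` is a list of (chromosome, start, stop) tuples; this layout is an
    assumption made for illustration only.
    """
    # Tally how many reads begin or end at each position on each chromosome.
    starts = Counter((chrom, start) for chrom, start, _ in reads)
    stops = Counter((chrom, stop) for chrom, _, stop in reads)
    return sum(
        1
        for chrom, start, stop in reads
        if starts[(chrom, start)] == 1 and stops[(chrom, stop)] == 1
    )


# Hypothetical reads: the first two share a start point, so neither is unique.
example_reads = [
    ("chr1", 100, 250),
    ("chr1", 100, 260),
    ("chr1", 300, 450),
    ("chr2", 100, 250),
]
unique_count = count_unique_reads(example_reads)
```

Positions are keyed by chromosome so that reads at the same coordinate on different chromosomes are not treated as sharing a start or stop point.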

Overall,  all  62  samples  passed  all  five  quality  measurements  and,  thus,  were  included  in 
the  analysis. 
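The five quality thresholds described above can be expressed as a simple per-sample filter. The sketch below uses the cutoffs stated in the text; the `SampleQC` record, its field names and the example values are hypothetical.

```python
from dataclasses import dataclass

# Minimum thresholds as stated in the text.
MIN_SATURATION = 0.5
MIN_ENRICHMENT = 1.7
MIN_PCT_5X_COVERAGE = 5.0      # percent of sequence with >=5X coverage
MIN_TOTAL_READS = 20_000_000
MIN_UNIQUE_READS = 15_000_000


@dataclass
class SampleQC:
    """Per-sample quality metrics (field names are illustrative assumptions)."""
    sample_id: str
    saturation: float
    enrichment: float
    pct_5x_coverage: float
    total_reads: int
    unique_reads: int


def failed_checks(qc):
    """Return the names of the quality checks that the sample does not meet."""
    checks = {
        "saturation": qc.saturation >= MIN_SATURATION,
        "enrichment": qc.enrichment >= MIN_ENRICHMENT,
        "pct_5x_coverage": qc.pct_5x_coverage >= MIN_PCT_5X_COVERAGE,
        "total_reads": qc.total_reads >= MIN_TOTAL_READS,
        "unique_reads": qc.unique_reads >= MIN_UNIQUE_READS,
    }
    return [name for name, ok in checks.items() if not ok]


def passes_qc(qc):
    """A sample is kept only if it meets all five thresholds."""
    return not failed_checks(qc)


# Hypothetical metric values, not taken from the study data.
good = SampleQC("example_pass", 0.82, 2.4, 11.3, 31_000_000, 22_500_000)
bad = SampleQC("example_fail", 0.82, 1.5, 11.3, 31_000_000, 12_000_000)
```

Returning the list of failed checks, rather than only a pass/fail flag, makes it easy to see which metric excluded a sample.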


Figure 3: Total Reads and Unique Reads

At least 20 million reads were needed, with at least 15 million of those reads being classified as unique, to pass
quality control.

Global Methylation Index (GMI)

The global methylation index (GMI) was calculated across the genome for each sample.
The average GMI was calculated for each genotype group (C/C, C/T and T/T), and the group
means were compared using t-tests. Following this preliminary analysis, the C/T and T/T
groups were combined into one group for further analysis because there was no significant
difference in mean GMI between the C/T and T/T groups (data not shown).

The mean GMI for the C/C genotype group was 47.22, with a standard deviation of 18.41
and n=35. The mean GMI for the C/T+T/T genotype group was 40.22, with a standard deviation
of 9.37 and n=27. A t-test comparing the mean GMI of the C/C genotype group with that of the
C/T+T/T genotype group showed a marginally statistically significant difference between the
groups (p=0.0569; Figure 4).
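As a cross-check, the comparison can be approximately reproduced from the summary statistics alone. The sketch below assumes the unequal-variance (Welch) form of the t-test noted in the Figure 4 caption, and it evaluates the two-tailed p-value by numerically integrating the t density with Simpson's rule. The original analysis was run in SPSS, so this is an independent illustration rather than the authors' procedure, and all function names are hypothetical.

```python
import math


def welch_t_from_stats(mean1, sd1, n1, mean2, sd2, n2):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def t_pdf(x, df):
    """Density of Student's t distribution with (possibly fractional) df."""
    log_norm = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value via Simpson's rule on [0, |t|]; steps must be even."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * total * h / 3  # P(-|t| < T < |t|)
    return 1 - central_mass


# Summary statistics reported in the text: C/C versus C/T+T/T.
t_stat, dof = welch_t_from_stats(47.22, 18.41, 35, 40.22, 9.37, 27)
p_value = two_tailed_p(t_stat, dof)
```

With the reported means, standard deviations and group sizes, this yields a t statistic of about 1.95 on roughly 53 degrees of freedom and a two-tailed p-value of approximately 0.057, in line with the reported p=0.0569.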


Figure 4: Mean Global Methylation Index (GMI) for Whole Genome Methylation
Comparison Between MTHFR C677T Genotype Groups

Samples were grouped into two groups: C/C genotype and genotypes with a T present (C/T and T/T). A two-tailed
t-test with independent samples and unequal variances was run using IBM SPSS Statistics v22 to compare the mean
global methylation index (GMI), where n=35 for C/C and n=27 for C/T+T/T.
*: marginally significant difference, p=0.0569

Analysis  of  Differentially  Methylated  Regions  (DMRs) 

The methylation level within each 200 bp window was calculated and normalized to reads
per kilobase per million (RPKM). Following a comparison of RPKM values in each 200 bp
window between the C/C and C/T+T/T groups at p=0.05, 238 differentially methylated regions
(DMRs) were identified. These identified DMRs are summarized in a table not presented here.
Of these, 140 DMRs were hypermethylated in the C/C group relative to the C/T+T/T group, and
98 DMRs were hypermethylated in the C/T+T/T group relative to the C/C group. The RPKM
values for the DMRs found to be hypermethylated in the C/T+T/T group were generally lower
than those for the DMRs identified as hypermethylated in the C/C group (see Figure 5).
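For reference, the RPKM normalization named above follows the standard definition: reads in the window, divided by the window length in kilobases and by the total mapped reads in millions. The sketch below applies it to a single 200 bp window; the function name and the example counts are hypothetical.

```python
def rpkm(window_reads, window_length_bp, total_mapped_reads):
    """Reads per kilobase per million mapped reads for one genomic window."""
    if window_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("window length and total mapped reads must be positive")
    # Equivalent to reads / (length_bp / 1e3) / (total_reads / 1e6).
    return window_reads * 1e9 / (window_length_bp * total_mapped_reads)


# Hypothetical example: 10 reads in a 200 bp window of a 20-million-read sample.
example_rpkm = rpkm(10, 200, 20_000_000)
```

Because every window has the same 200 bp length, the normalization mainly corrects for differences in sequencing depth between samples.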


Figure 5: Reads per Kilobase per Million (RPKM) Mean Comparison Between Genotype
Groups Across Differentially Methylated Regions (DMRs)

Comparison of mean RPKM values across DMRs when C/T + T/T genotype group was hypermethylated (blue dots)
and when C/C genotype group mean was hypermethylated (orange dots).


We found that the identified DMRs were not evenly distributed across chromosomes.
This phenomenon was quantified by comparing the actual number of DMRs identified on each
chromosome with the number of DMRs expected for that chromosome based on chromosome
size. For 16 of the 24 chromosomes, the actual number of DMRs differed significantly from the
expected number (see Figure 7). Six of these chromosomes were significantly enriched for
DMRs, while the other 10 had significantly fewer DMRs than expected.
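The size-based expectation can be sketched as follows: if DMRs were spread evenly, a chromosome's expected count would be the total number of DMRs multiplied by that chromosome's share of the genome length. The text does not name the test used to compare observed and expected counts, so the exact two-sided binomial test below is an assumption chosen for illustration. The chromosome and genome lengths are hypothetical round numbers, not hg19 values.

```python
from math import comb

TOTAL_DMRS = 238  # number of DMRs reported in the text


def expected_dmrs(chrom_length_bp, genome_length_bp, total_dmrs=TOTAL_DMRS):
    """Expected DMR count if DMRs were distributed in proportion to size."""
    return total_dmrs * chrom_length_bp / genome_length_bp


def binomial_two_sided_p(observed, n, prob):
    """Exact two-sided binomial p-value (assumed test, not from the source).

    Sums the probabilities of all outcomes that are no more likely than the
    observed count.
    """
    pmf = [comb(n, k) * prob ** k * (1 - prob) ** (n - k) for k in range(n + 1)]
    cutoff = pmf[observed] * (1 + 1e-9)  # small tolerance for float ties
    return min(1.0, sum(p for p in pmf if p <= cutoff))


# Hypothetical chromosome covering 10% of a 3 Gb genome.
chrom_len, genome_len = 300_000_000, 3_000_000_000
share = chrom_len / genome_len
expected = expected_dmrs(chrom_len, genome_len)

p_near_expected = binomial_two_sided_p(24, TOTAL_DMRS, share)
p_enriched = binomial_two_sided_p(60, TOTAL_DMRS, share)
```

An observed count close to the expectation gives a large p-value, while a count far above it gives a very small one, which is the pattern a per-chromosome enrichment call would rely on.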


Figure 7: Genomic Distribution of Differentially Methylated Regions (DMRs)

The actual number of DMRs for each chromosome at p=0.05 versus the number of DMRs expected per chromosome
if the DMRs were distributed evenly based on chromosome size (hg19/GRCh37, Feb 2009;
http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/data/). There were no data available for
mitochondrial DNA.

*: significantly different from expected, p<0.05


Sequenom  Validation 

To validate the MBD-Seq findings, an alternate method, Sequenom MassARRAY, was
chosen to analyze a selection of the identified DMRs. Of the 59 DMRs identified in intragenic
regions, 8 were selected for validation. Six of these DMRs were hypermethylated in the C/C
group and two were hypermethylated in the C/T+T/T group. The data are under analysis.


